HYBRIDIZATION AND SEQUENCING OF NUCLEIC ACIDS



GOVERNMENT RIGHTS 
The invention described herein arose in the course of or 
under Contract No. DE-FG03-92ER81275 (Grant No. 21012-92-11) between 
the Department of Energy and Affymax; and in the course of or under 
NIH Contract No. 1R01HG00813-01.

BACKGROUND OF THE INVENTION 
The present invention relates to the field of nucleic 
acid analysis, detection, and sequencing. More specifically, in 
one embodiment the invention provides improved techniques for 
synthesizing arrays of nucleic acids, hybridizing nucleic acids, 
detecting mismatches in a double-stranded nucleic acid composed of a
single-stranded probe and a target nucleic acid, and determining the
sequence of DNA or RNA or other polymers.

It is important in many fields to determine the sequence 
of nucleic acids because, for example, nucleic acids encode the 
enzymes, structural proteins, and other effectors of biological 
functions. In addition to segments of nucleic acids that encode 
polypeptides, there are many nucleic acid sequences involved in 
control and regulation of gene expression. 

The human genome project is one example of a project 
using nucleic acid sequencing techniques. This project is directed 
toward determining the complete sequence of the genome of the human 
organism. Although such a sequence would not necessarily correspond 
to the sequence of any specific individual, it will provide 
significant information as to the general organization and specific 
sequences contained within genomic segments from particular 
individuals. The human genome project will also provide mapping 
information useful for further detailed studies. 

The need for highly rapid, accurate, and inexpensive 
sequencing technology is nowhere more apparent than in a demanding 
sequencing project such as the human genome project. To complete the 
sequencing of a human genome will require the determination of 
approximately 3x10^9, or 3 billion, base pairs.

The procedures typically used today for sequencing
include the methods described in Sanger et al., Proc. Natl. Acad.
Sci. USA (1977) 74:5463-5467, and Maxam et al., Methods in Enzymology
(1980) 65:499-559. The Sanger method utilizes enzymatic elongation
with chain terminating dideoxy nucleotides. The Maxam and Gilbert
method uses chemical reactions that cleave at specific bases.
Both methods require a large number of complex
manipulations, such as isolation of homogeneous DNA fragments,
elaborate and tedious preparation of samples, preparation of a
separating gel, application of samples to the gel, electrophoresing
the samples on the gel, working up of the finished gel, and analysis
of the results of the procedure.

Alternative techniques have been proposed for sequencing 
a nucleic acid. PCT patent Publication No. 92/10588, incorporated 
herein by reference for all purposes, describes one improved 
technique in which the sequence of a labeled, target nucleic acid is 
determined by hybridization to an array of nucleic acid probes on a 
substrate. Each probe is located at a positionally distinguishable 
location on the substrate. When the labeled target is exposed to the 
substrate, it binds at locations that contain complementary 
nucleotide sequences. Through knowledge of the sequence of the 
probes at the binding locations, one can determine the nucleotide 
sequence of the target nucleic acid. The technique is particularly
efficient when very large arrays of nucleic acid probes are utilized.
Such arrays can be formed according to the techniques described in
U.S. Patent No. 5,143,854 issued to Pirrung et al. See also U.S.
application Serial No. 07/805,727, both incorporated herein by
reference for all purposes.

When the nucleic acid probes are of a length shorter than
the target, one can employ a reconstruction technique to determine
the sequence of the larger target based on affinity data from the
shorter probes. See U.S. Patent No. 5,202,231 to Drmanac et al.,
and PCT patent Publication No. 89/10977 to Southern. One technique
for overcoming this difficulty has been termed sequencing by
hybridization or SBH. For example, assume that a 12-mer target DNA
5'-AGCCTAGCTGAA is mixed with an array of all octanucleotide probes.
If the target binds only to those probes having an exactly
complementary nucleotide sequence, only five of the 65,536 octamer
probes (3'-TCGGATCG, CGGATCGA, GGATCGAC, GATCGACT, and ATCGACTT) will
hybridize to the target. Alignment of the overlapping sequences from
the hybridizing probes reconstructs the complement of the original
12-mer target:

TCGGATCG
 CGGATCGA
  GGATCGAC
   GATCGACT
    ATCGACTT
TCGGATCGACTT
While meeting with much optimism, prior techniques have 
also met with certain limitations. For example, practitioners have 
encountered substantial difficulty in analyzing probe arrays
hybridized to a target nucleic acid due to the hybridization of
partially mismatched sequences, among other difficulties. The
present invention provides significant advances in sequencing with
such arrays.

SUMMARY OF THE INVENTION 
Improved techniques for synthesizing, hybridizing,
analyzing, and sequencing nucleic acids (oligonucleotides) are 
provided by the present invention. 

According to one embodiment of the invention, a target
oligonucleotide is exposed to a large number of immobilized probes
of shorter length. The probes are collectively referred to as an
"array." In the method, one identifies whether a target nucleic acid
is complementary to a probe in the array by identifying first a core
probe having high affinity to the target, and then evaluating the
binding characteristics of all probes with a single base mismatch as
compared to the core probe. If the single base mismatch probes
exhibit a characteristic binding or affinity pattern, then the core
probe is exactly complementary to at least a portion of the target
nucleic acid.
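
By way of illustration only, the set of single base mismatch probes for a given core probe can be enumerated as in the following Python sketch. The function names and the intensity dictionary are hypothetical. Summarizing each position by the strongest of its three mismatch signals, normalized to the core probe signal, is an assumption made for the sketch and is not a required feature of the method.

```python
BASES = "ACGT"


def single_base_mismatches(core):
    """Map (position, substituted base) to the probe differing from core there."""
    variants = {}
    for i, original in enumerate(core):
        for base in BASES:
            if base != original:
                variants[(i, base)] = core[:i] + base + core[i + 1:]
    return variants


def mismatch_profile(core, intensity):
    """Strongest mismatch signal at each position, normalized to the core signal.

    `intensity` maps probe sequence to measured signal (hypothetical data);
    probes missing from the mapping are treated as giving no signal.
    """
    reference = intensity[core]
    variants = single_base_mismatches(core)
    profile = []
    for i in range(len(core)):
        signals = [intensity.get(probe, 0.0)
                   for (pos, _), probe in variants.items() if pos == i]
        profile.append(max(signals) / reference)
    return profile


variants = single_base_mismatches("CGCATCCG")
print(len(variants))          # three substitutions at each of eight positions
print(variants[(3, "T")])     # the internal A-to-T substitution
```

A core probe of length k has 3 x k single base mismatch probes, so an octamer core has 24.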

The method can be extended to sequence a target nucleic
acid larger than any probe in the array by evaluating the binding
affinity of probes that can be termed "left" and "right" extensions
of the core probe. The correct left and right extensions of the
core are those that exhibit the strongest binding affinity and/or
a specific hybridization pattern of single base mismatch probes.
The binding affinity characteristics of single base mismatch probes
follow a characteristic pattern in which probe/target complexes with
mismatches on the 3' or 5' termini are more stable than probe/target
complexes with internal mismatches. The process is then repeated to
determine additional left and right extensions of the core probe to
provide the sequence of a nucleic acid target.
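
By way of illustration only, one extension step can be sketched in Python as follows. The sketch assumes an array of probes of a single length k, written 3' to 5', so that each candidate extension overlaps the core probe by k-1 bases. The function names and the intensity values are hypothetical, and choosing the extension by strongest signal alone is a simplification; the mismatch pattern of each candidate could also be examined.

```python
BASES = "ACGT"


def extension_candidates(core):
    """Return (left, right) candidate probes overlapping core by k-1 bases.

    Probes are written 3'->5'; a left extension adds a base at the 3' end
    and a right extension adds a base at the 5' end.
    """
    left = [base + core[:-1] for base in BASES]
    right = [core[1:] + base for base in BASES]
    return left, right


def best_extension(candidates, intensity):
    """Pick the candidate with the strongest signal (absent probes count as 0)."""
    return max(candidates, key=lambda probe: intensity.get(probe, 0.0))


# Hypothetical signals for probes neighbouring the core CGGATCGA.
signals = {"TCGGATCG": 900.0, "ACGGATCG": 40.0, "GGATCGAC": 850.0}
left, right = extension_candidates("CGGATCGA")
print(best_extension(left, signals), best_extension(right, signals))
```

Repeating the step with each chosen extension as the new core walks outward along the target in both directions.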

In some embodiments, such as in diagnostics, a target is
expected to have a particular sequence. To determine if the target
has the expected sequence, an array of probes is synthesized that
includes a complementary probe and all or some subset of all single
base mismatch probes. Through analysis of the hybridization pattern
of the target to such probes, it can be determined if the target has
the expected sequence and, if not, the sequence of the target may
optionally be determined.

Kits for analysis of nucleic acid targets are also 
provided by virtue of the present invention. According to one 
embodiment, a kit includes an array of nucleic acid probes. The 
probes may include a perfect complement to a target nucleic acid. 
The probes also include probes that are single base substitutions of
the perfect complement probe. The kit may include one or more of the
A, C, T, G, and/or U substitutions of the perfect complement. Such
kits will have a variety of uses, including analysis of targets for
a particular genetic sequence, such as in analysis for genetic
diseases.

A further understanding of the nature and advantages of
the inventions herein may be realized by reference to the remaining
portions of the specification and the attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 
Fig. 1 illustrates light-directed synthesis of
oligonucleotides. A surface (2) bearing photoprotected hydroxyls
(OX) is illuminated through a photolithographic mask (M1), generating
free hydroxyls (OH) in the photodeprotected regions. The hydroxyl
groups are then coupled to a 5'-photoprotected
phosphoramidite (e.g., T-X). A new mask (M2) is used to illuminate
a new pattern on the surface, and a second photoprotected
phosphoramidite (e.g., C-X) is then coupled. Rounds of illumination
and coupling are repeated until the desired set of oligonucleotide
probes is obtained. A target (R) is exposed to the array,
optionally with a label (*). The location(s) where the target binds
to the array is used to determine the sequence of the target;

Fig. 2 illustrates hybridization and thermal dissociation
of oligonucleotides, showing a fluorescence scan of a target nucleic
acid (5'-GCGTAGGC-fluorescein) hybridized to an array of probes. The
substrate surface was scanned with a Zeiss Axioscop 20 microscope
using 488 nm argon ion laser excitation. The fluorescence emission
above 520 nm was detected using a cooled photomultiplier (Hamamatsu
934-02) operated in photon counting mode. The signal intensity is
indicated on the scale shown to the right of the image. The
temperature is indicated to the right of each panel in °C;

Fig. 3 illustrates the sequence specificity of
hybridization. (A) is an index of the probe composition at each
synthesis site. The 3'-CGCATCCG surface immobilized probe (referred
to herein as S-3'-CGCATCCG) was synthesized in stripes 1, 3, and 5,
and the probe S-3'-CGCTTCCG was synthesized in stripes 2, 4, and 6.
(B) is a fluorescence image showing hybridization of the substrate
with a target nucleic acid (10 nM 5'-GCGTAGGC-fluorescein).
Hybridization was performed in 6X SSPE, 0.1% Triton X-100 at 15°C for
15 min. (C) is a fluorescence image showing hybridization with a
second nucleic acid (10 nM 5'-GCGAAGGC) added to the
solution of (B). (D) is a fluorescence image showing hybridization
results after (1) high temperature dissociation of
targets from (C); and (2) incubation of the substrate with a target
nucleic acid (10 nM 5'-GCGAAGGC) at 15°C for 15 min. (E) is a
fluorescence image showing hybridization with a second nucleic acid
(10 nM 5'-GCGTAGGC) added to the hybridization solution of (D);

Fig. 4 illustrates combinatorial synthesis of 4^4
tetranucleotides. In round 1, one-fourth of the synthesis area is
activated by illumination through mask 1 for coupling of the first
MeNPoc-nucleoside (T in this case). In cycle 2 of round 1, mask 2
activates a different one quarter section of the synthesis substrate,
and a different nucleoside (C) is coupled. Further lithographic
subdivisions of the array and chemical couplings generate the
complete set of 256 tetranucleotides;

Figs. 5A and 5B illustrate hybridization to an array
of 256 octanucleotides. Fig. 5A is a fluorescence image following
hybridization of the array with a target nucleic acid (10 nM
5'-GCGGCGGC-fluorescein) in 6X SSPE, 0.1% Triton X-100 for 15 min at
15°C. Fig. 5B is a matrix decoder showing where each probe made
during the synthesis of S-3'-CG(A+G+C+T)^4CG is located. The site
containing the probe sequence S-3'-CGCGCCCG is shown as a dark area.
The combinatorial synthesis notation used herein is fully described
in U.S. application Serial No. 07/624,120, incorporated herein by
reference for all purposes;

Figs. 6A to 6C illustrate a technique for sequencing
an n-mer target using k-mer probes. Fig. 6A illustrates a target
hybridized to a probe on a substrate. Figs. 6B and 6C illustrate
plots of normalized binding affinity vs. mismatch position;

Fig. 7 illustrates a fluorescence image of a 
hybridization experiment; 

Fig. 8 illustrates hybridization events graphically as a
function of single base mismatch;

Fig. 9 illustrates fluorescence intensity as a function 
of pairs of mismatches; 

Fig. 10 illustrates a fluorescence image of a single base
mismatch experiment;

Figs. 11A to 11C illustrate various single base mismatch
profiles;

Figs. 12A to 12D illustrate a process for determining the
nucleotide sequence of an n-member (the number of monomers in the
nucleotide) target oligonucleotide based on hybridization results
from shorter k-member probes. In particular, Figs. 12A to 12D
illustrate application of the present method to sequencing a 10-base
target with 4-base probes;

Fig. 13 illustrates a computer system for determining 
nucleotide sequence; 

Fig. 14 illustrates a computer program for mismatch 
analysis and for determining the nucleotide sequence of a target 
nucleic acid; 



Figs. 15A and 15B illustrate wild-type and mutation 
analysis using single base mismatch profiles; 

Fig. 16 is a fluorescence image of a single base mismatch
test; and

Figs. 17A to 17D illustrate a technique for nucleic acid 
sequence identification. 

Definitions

Probe - A molecule of known composition or monomer 
sequence, typically formed on a solid surface, which is or may be 
exposed to a target molecule and examined to determine if the probe 
has hybridized to the target. A "core" probe is a probe that 
exhibits strong affinity for a target. An "extension" probe is a 
probe that includes all or a portion of a core probe sequence plus 
one or more possible extensions of the core probe sequence. The 
present application refers to a "left" extension as an extension at
the 3'-end of a probe and a "right" extension as an extension
at the 5'-end of a probe, although the opposite notation could
obviously be adopted.

Target - A molecule, typically of unknown composition or 
monomer sequence, for which it is desired to study the composition or 
monomer sequence. A target may be a part of a larger molecule, such 
as a few bases in a longer nucleic acid. 

n-Base Mismatch - A probe having n monomers therein that
differ from the corresponding monomers in a core probe, wherein n is
one or greater.

A, T, C, G, U - Abbreviations for the nucleotides
adenine, thymine, cytosine, guanine, and uridine, respectively.

Library - A collection of nucleic acid probes of
predefined nucleotide sequence, often formed in one or more
substrates, which are used in hybridization studies of target nucleic
acids.



A. Synthesis 

A method for a light-directed oligonucleotide synthesis 
is depicted in Fig. 1. Such strategies are described in greater 
detail in U.S. Patent No. 5,143,854, assigned to the assignee of the 
present inventions and incorporated herein by reference for all 
purposes.

In the light-directed synthesis method illustrated in 
Fig. 1, a surface 2 derivatized with a photolabile protecting group
or groups (X) is illuminated through a photolithographic mask M1,
exposing reactive hydroxyl (OH) groups. The first (T-X) of a
series of phosphoramidite activated nucleosides (protected at the 
5' -hydroxyl with a photolabile protecting group) is then exposed to 
the entire surface. Coupling only occurs at the sites that were 
exposed to light during the preceding illumination. 

After the coupling reaction is complete, the substrate 
is rinsed, and the surface is again illuminated through a new or 
translated mask M2 to expose different groups for coupling. A new 
phosphoramidite activated nucleoside C-X (again protected at the 
5' -hydroxyl with a photolabile protecting group) is added and coupled 
to the exposed sites. The process is repeated through cycles of 
photodeprotection and coupling to produce a desired set of 
oligonucleotide probes on the substrate. Because photolithography 
is used, the process can be miniaturized. Furthermore, because 
reactions only occur at sites spatially addressed by light, the 
nucleotide sequence of the probe at each site is precisely known, and 
the interaction of oligonucleotide probes at each site with target 
molecules (either target nucleic acids or, in other embodiments,
proteins such as receptors) can be assessed. 

Photoprotected deoxynucleosides have been developed for
this process, including
5'-O-(α-methyl-6-nitropiperonyloxycarbonyl)-N-acyl-2'-deoxynucleosides,
or MeNPoc-N-acyl-deoxynucleosides: MeNPoc-dT, MeNPoc-dC^ibu,
MeNPoc-dG^PAC, and MeNPoc-dA^PAC. Protecting group
chemistry is disclosed in greater detail in PCT patent Publication
No. 92/10092 and U.S. application Serial Nos. 07/624,120, filed
December 6, 1990, and 07/971,181, filed November 2, 1992, both
assigned to the assignee of this invention and incorporated herein by
reference for all purposes.

Examples 

1. Protecting Groups

Because the bases have strong π-π* transitions in
the 280 nm region, the deprotection wavelength of photoremovable
protecting groups should be at wavelengths longer than 280 nm to
avoid undesirable nucleoside photochemistry. In addition, the
photodeprotection rates of the four deoxynucleosides should be
similar, so that light will equally deprotect hydroxyls (or other
functional groups, such as sulfhydryl or amino groups) in all
illuminated synthesis sites.

To meet these criteria, a set of
5'-O-(α-methyl-6-nitropiperonyloxycarbonyl)-N-acyl-2'-deoxynucleosides
(MeNPoc-N-acyl-deoxynucleosides) has been developed for light-directed
synthesis, and the photokinetic behavior of the protected nucleosides
has been measured. The synthetic pathway for preparing
5'-O-(α-methyl-6-nitropiperonyloxycarbonyl)-N-acyl-2'-deoxynucleoside
phosphoramidites is illustrated in Scheme I.

Scheme I



In the first step, an N-acyl-2'-deoxynucleoside was reacted with
1-(2-nitro-4,5-methylenedioxyphenyl)-ethan-1-chloroformate to yield
5'-MeNPoc-N-acyl-2'-deoxynucleoside. In the second step, the
3'-hydroxyl was reacted with
2-cyanoethyl-N,N'-diisopropylchlorophosphoramidite using standard
procedures to yield the
5'-MeNPoc-N-acyl-2'-deoxynucleoside-3'-O-diisopropylchlorophosphoramidites.
These reagents were stable for long periods when stored dry under
argon at 4°C.

A 0.1 mM solution of each of the four deoxynucleosides,
MeNPoc-dT, MeNPoc-dC^ibu, MeNPoc-dG^PAC, and MeNPoc-dA^PAC, was prepared
in dioxane. Aliquots (200 µL) were irradiated with 14.5 mW/cm^2 365
nm light in a narrow path (2 mm) quartz cuvette for various times.
Four or five time points were collected for each base, and the
solutions were analyzed for loss of starting material with an HPLC
system at 280 nm and a Nucleosil 5-C8 HPLC column, eluting with a
mobile phase of 60% (v/v) methanol in water containing 0.1% (v/v) TFA
(MeNPoc-dT required a mobile phase of 70% (v/v) methanol in water).
Peak areas of the residual MeNPoc-N-acyl-deoxynucleoside were
calculated, yielding photolysis half-times of 28 s, 31 s, 27 s, and
18 s for MeNPoc-dT, MeNPoc-dC^ibu, MeNPoc-dG^PAC, and MeNPoc-dA^PAC,
respectively. In subsequent lithographic experiments, illumination
times of 4.5 min (9 x t1/2 of MeNPoc-dC) led to more than 99% removal of
MeNPoc protecting groups.
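
By way of illustration only, the relationship between the measured half-times and the 4.5 minute illumination can be checked with the short Python sketch below. The sketch assumes simple first-order (exponential) photolysis, which is an assumption made for illustration; the function and dictionary names are hypothetical.

```python
# Photolysis half-times in seconds, as reported above.
HALF_TIMES_S = {
    "MeNPoc-dT": 28.0,
    "MeNPoc-dC": 31.0,
    "MeNPoc-dG": 27.0,
    "MeNPoc-dA": 18.0,
}


def fraction_deprotected(t_seconds, half_time_seconds):
    """Fraction of protecting groups removed after t seconds.

    Assumes first-order decay: the protected fraction halves every half-time.
    """
    return 1.0 - 0.5 ** (t_seconds / half_time_seconds)


illumination_s = 4.5 * 60  # 4.5 minutes
for name, half_time in HALF_TIMES_S.items():
    removed = fraction_deprotected(illumination_s, half_time)
    print(f"{name}: {illumination_s / half_time:.1f} half-times, "
          f"{100 * removed:.2f}% removed")
```

Under this assumption the slowest nucleoside (31 s half-time) receives nearly nine half-times of exposure, which corresponds to removal of more than 99% of the protecting groups.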

In a light-directed synthesis, the overall synthesis
yield depends on the photodeprotection yield, the photodeprotection
contrast, and the chemical coupling efficiency. Photokinetic
conditions are preferably chosen to ensure that photodeprotection
yields are over 99%. Unwanted photolysis in normally dark regions of
the substrate can adversely affect the synthesis fidelity but can be
minimized by using lithographic masks with a high optical density
(5 ODU) and by careful index matching of the optical surfaces.
Condensation efficiencies of DMT-N-acyl-deoxynucleoside
phosphoramidites to the glass substrates have been measured in the
range of 93% to 99%. The condensation efficiencies of the
MeNPoc-N-acyl-deoxynucleoside phosphoramidites have also been measured at
greater than 90%, although the efficiencies can vary from synthesis
to synthesis and should be monitored.
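
By way of illustration only, the effect of per-step yields on the amount of full-length probe can be estimated with the Python sketch below. The model is an assumption made for illustration: it treats every synthesis step as independent, multiplies the photodeprotection yield by the coupling efficiency, and ignores photodeprotection contrast. The function name is hypothetical.

```python
def full_length_fraction(photo_yield, coupling_efficiency, n_steps):
    """Estimated fraction of sites carrying the full-length probe.

    Simplified model (an assumption): each of n_steps succeeds with
    probability photo_yield * coupling_efficiency, independently.
    """
    return (photo_yield * coupling_efficiency) ** n_steps


# Eight coupling steps (an octanucleotide) at 99% photodeprotection yield.
for coupling in (0.90, 0.95, 0.99):
    fraction = full_length_fraction(0.99, coupling, 8)
    print(f"coupling {coupling:.2f}: {100 * fraction:.0f}% full length")
```

The sketch shows why coupling efficiency should be monitored: under these assumptions, the difference between 90% and 99% coupling roughly doubles the fraction of full-length octanucleotide.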

2. Coupling Efficiency Measurements

To investigate the coupling efficiencies
of the photoprotected nucleosides, each of the four MeNPoc-amidites
was first coupled to a substrate (via DMT chemistry). A region of
the substrate was illuminated, and a MeNPoc-phosphoramidite was
added without a protective group. A new region of the substrate was
then illuminated; a fluorescent deoxynucleoside phosphoramidite
(FAM-phosphoramidite, Applied Biosystems) was coupled; and the
substrate was scanned for signal. If the fluorescently labeled
phosphoramidite reacts at both the newly exposed hydroxyl groups
and the previously unreacted hydroxyl groups, then the ratio of
fluorescence intensities between the two sites provides a measure
of the coupling efficiency. This measurement assumes that surface
photolysis yields are near unity. The chemical coupling yields using
this or similar assays are variable but high, ranging from 80% to 95%.

In a separate assay, the chemical coupling efficiencies
were measured on hexaethyleneglycol derivatized substrate. First,
a glycol linker was detritylated and a
MeNPoc-deoxynucleoside-O-cyanoethylphosphoramidite
coupled to the resin without capping.
Next, a DMT-deoxynucleoside-cyanoethylphosphoramidite
(reporter-amidite) was coupled to the resin. The reporter-amidite couples
to any unreacted hydroxyl groups from the first step. The trityl
effluents were collected and quantified by absorption spectroscopy.
Effluents were also collected from the lines immediately after the
MeNPoc-phosphoramidite coupling to measure residual trityl left in
the delivery lines. In this assay, the coupling efficiencies are
measured assuming a 100% coupling efficiency of the reporter-amidite.
The coupling efficiencies of the
MeNPoc-deoxyribonucleoside-O-cyanoethylphosphoramidites
to the hexaethyleneglycol linker and the
efficiencies of the sixteen dinucleotides were measured and were
indistinguishable from DMT-deoxynucleoside phosphoramidites.

3. Spatially Directed Synthesis of an Oligonucleotide Probe

To initiate the synthesis of an oligonucleotide probe,
substrates were prepared, and MeNPoc-dC^ibu-3'-O-phosphoramidite was
attached to a synthesis support through a synthetic linker. Regions
of the support were activated for synthesis by illumination through
800 x 1280 µm apertures of a photolithographic mask. Seven
additional phosphoramidite synthesis cycles were performed (with
the corresponding DMT protected deoxynucleosides) to generate the
S-3'-CGCATCCG. Following removal of the phosphate and exocyclic
amine protecting groups with concentrated NH4OH for 4 hours at
room temperature, the substrate was mounted in a water jacketed
thermostatically controlled hybridization chamber. This substrate
was used in the mismatch experiments referred to below.

B. Hybridization

Oligonucleotide arrays can be used in a wide variety of 
applications, including hybridization studies. In a hybridization 
study, the array can be exposed to a receptor (R) of interest, as
shown in Fig. 1. The receptor can be labelled with an appropriate
label (*), such as fluorescein. The locations on the substrate where
the receptor has bound are determined and, through knowledge of the
sequence of the oligonucleotide probe at that location, one can then
determine, if the receptor is an oligonucleotide, the sequence of the
receptor.

Sequencing by hybridization (SBH) is most efficiently 
practiced by attaching many probes to a surface to form an array in 
which the identity of the probe at each site is known. A labeled 
target DNA or RNA is then hybridized to the array, and the 
hybridization pattern is examined to determine the identity of all 
complementary probes in the array. Contrary to the teachings of the 
prior art, which teaches that mismatched probe/target complexes are 
not of interest, the present invention provides an analytical method 
in which the hybridization signal of mismatched probe/target 
complexes identifies or confirms the identity of the perfectly 
matched probe/target complexes on the array. 



Arrays of oligonucleotides are efficiently generated for
the hybridization studies using light-directed synthesis techniques.
As discussed below, an array of all tetranucleotides was produced in
sixteen cycles, which required only 4 hours to complete. Because
combinatorial strategies are used, the number of different compounds
on the array increases exponentially during synthesis, while the
number of chemical coupling cycles increases only linearly. For
example, expanding the synthesis to the complete set of 4^8 (65,536)
octanucleotides adds only 4 hours (or less) to the synthesis due to
the 16 additional cycles required. Furthermore, combinatorial
synthesis strategies can be implemented to generate arrays of any
desired probe composition. For example, because the entire set of
dodecamers (4^12) can be produced in 48 photolysis and coupling cycles
or less (b^n compounds require no more than b x n cycles), any subset
of the dodecamers (including any subset of shorter oligonucleotides)
can be constructed in 48 or fewer chemical coupling steps. The
number of compounds in an array is limited only by the density of
synthesis sites and the overall array size. The present invention
has been practiced with arrays with probes synthesized in square
sites 25 microns on a side. At this resolution, the entire set of
65,536 octanucleotides can be placed in an array measuring only
0.64 cm2. The set of 1,048,576 dodecanucleotides requires only a
2.56 cm2 array at this individual probe site size.
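
By way of illustration only, the scaling described above can be checked with the Python sketch below. The function names are hypothetical. The sketch assumes that the square synthesis sites are packed edge to edge in a square grid and reports the length of one edge of that grid; treating the quoted array dimensions as edge lengths is an assumption of the sketch.

```python
import math


def n_compounds(n_bases, length):
    """Number of distinct sequences of the given length (b^n)."""
    return n_bases ** length


def max_cycles(n_bases, length):
    """Upper bound on photolysis and coupling cycles (b x n)."""
    return n_bases * length


def grid_side_cm(n_sites, site_um=25.0):
    """Edge length in cm of the smallest square grid holding n_sites sites."""
    per_side = math.isqrt(n_sites)
    if per_side * per_side < n_sites:
        per_side += 1
    return per_side * site_um / 1e4  # 10,000 microns per centimeter


for length in (4, 8, 12):
    print(length, n_compounds(4, length), max_cycles(4, length))

print(grid_side_cm(65_536))      # 256 x 256 sites of 25 microns
print(grid_side_cm(1_048_576))   # 1024 x 1024 sites of 25 microns
```

Under these assumptions, 65,536 sites occupy a square 0.64 cm on an edge and 1,048,576 sites (4^10) occupy a square 2.56 cm on an edge, while the cycle count grows only from 16 to 32 to 48 as the probe length grows from 4 to 8 to 12.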

The success of genome sequencing projects depends on 
efficient DNA sequencing technologies. Current methods are highly 
reliant on complex procedures and require substantial manual effort. 
SBH offers the potential for automating many of the manual efforts in
current practice. Light-directed synthesis offers an efficient means
for large scale production of miniaturized arrays not only for SBH
but for many other applications as well.

Although oligonucleotide arrays can be used for primary
sequencing applications, many diagnostic methods involve the analysis
of only a few nucleotide positions in a target nucleic acid sequence.
Because single base changes cause multiple changes in the
hybridization pattern of the target on a probe array, the
oligonucleotide arrays and methods of the present invention enable
one to check the accuracy of previously elucidated DNA sequences, or
to scan for changes or mutations in certain specific sequences within
a target nucleic acid. The latter is important, for example, for
genetic disease, quality control, and forensic analysis. With an
octanucleotide probe set, a single base change in a target nucleic
acid can be detected by the loss of eight perfect hybrids, and the
generation of eight new perfect hybrids. The single base change can
also be detected through altered mismatch probe/target complex
formation on the array. Perhaps even more surprisingly, such single
[...] are used to actually simplify the analysis.
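
By way of illustration only, the loss of eight perfect hybrids and the generation of eight new perfect hybrids on an octanucleotide array can be reproduced with the Python sketch below. The target sequence and function names are hypothetical. The sketch counts the distinct octamer windows of the target rather than their complements (the counts are the same), and it assumes the changed base lies at least seven bases from either end of the target and that no octamer window is repeated.

```python
def kmers(sequence, k=8):
    """Set of all k-base windows of the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def hybrid_changes(target, mutated, k=8):
    """Return (windows lost, windows gained) when target becomes mutated."""
    before, after = kmers(target, k), kmers(mutated, k)
    return before - after, after - before


# Hypothetical 24-base target with a single T-to-C change at index 12.
target = "AGCCTAGCTGAATCGGTACCATGA"
mutated = target[:12] + "C" + target[13:]

lost, gained = hybrid_changes(target, mutated)
print(len(lost), len(gained))
```

Each window that spans the changed base stops matching its original probe and starts matching a different probe, so an internal change affects k probes in each direction.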

The high information content of light-directed
oligonucleotide arrays greatly benefits genetic diagnostic testing.
Sequence comparisons of hundreds to thousands of different genes
can be assayed simultaneously instead of in a one-at-a-time format.
Arrays can also be constructed to contain genetic markers for the
rapid identification of a wide variety of pathogenic organisms,
and to study the sequence specificity of RNA/RNA, RNA/DNA,
protein/RNA, or protein/DNA interactions. One can use non
Watson-Crick oligonucleotides and novel synthetic nucleoside
analogs for antisense, triple helix, or other applications.
RNA monomers can be employed for RNA synthesis, and a wide variety of
synthetic and non-naturally occurring nucleic acid analogs can be
used, depending upon the motivations of the practitioner. See
PCT patent Publication Nos. 91/19813, 92/05285, and 92/14843,
incorporated herein by reference. In addition, the oligonucleotide
arrays can be used to deduce thermodynamic and kinetic rules
governing the formation and stability of oligonucleotide complexes.
Examples 

1. Hybridization to Surface Oligonucleotides

The support bound octanucleotide probes discussed
above were hybridized to a target of 5'-GCGTAGGC-fluorescein in
the hybridization chamber by incubation for 15 minutes at 15°C.
The array surface was then interrogated with an epifluorescence
microscope (488 nm argon ion excitation). The fluorescence image of
this scan is shown in Fig. 2. The fluorescence intensity pattern
matches the 800 x 1280 µm stripe used to direct the synthesis of the
probe. Furthermore, the signal intensities are high (four times over
the background of the glass substrate), demonstrating specific
binding of the target to the probe.

The behavior of the target-probe complex was investigated
by increasing the temperature of the hybridization solution. After
a 10 minute equilibration at each temperature, the substrate was
scanned for signal. The duplex melted in the temperature range
expected for the sequence under study (Tm = 28°C, obtained from the
rule Tm = [2(A+T) + 4(G+C)]). The probes in the array were stable to
temperature denaturation of the target-probe complex, as demonstrated
by rehybridization of target DNA.
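
By way of illustration only, the melting temperature rule quoted above, often called the Wallace rule, can be evaluated with the short Python sketch below. The function name is hypothetical, and the rule is only an approximation intended for short oligonucleotides.

```python
def tm_rule(sequence):
    """Approximate Tm in degrees C: 2 per A or T plus 4 per G or C."""
    at = sum(sequence.count(base) for base in "AT")
    gc = sum(sequence.count(base) for base in "GC")
    return 2 * at + 4 * gc


print(tm_rule("GCGTAGGC"))  # target used in Fig. 2
```

For the target 5'-GCGTAGGC, which contains two A or T bases and six G or C bases, the rule gives 2 x 2 + 4 x 6 = 28°C, in agreement with the value stated above.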



To demonstrate the sequence specificity of target
hybridization, two different probes were synthesized in 800 x 1280 µm
stripes. Fig. 3A identifies the location of the two probes. The
probe S-3'-CGCATCCG was synthesized in stripes 1, 3 and 5. The
probe S-3'-CGCTTCCG was synthesized in stripes 2, 4 and 6. Fig. 3B
shows the results of hybridizing a 5'-GCGTAGGC-fluorescein target to
the substrate at 15°C. Although the probes differ by only one internal
base, the target hybridizes specifically to its complementary
sequence (about 500 counts above background in stripes 1, 3 and 5) with
little or no detectable signal in positions 2, 4 and 6 (about 10 counts).
Fig. 3C shows the results of hybridization with targets to both
sequences. The signal in all positions in Fig. 3C illustrates that
the absence of signal in Fig. 3B is due solely to the instability
of the single base mismatch. Although the targets are present in
equimolar concentrations, the ratio of signals in stripes 2, 4 and 6
in Fig. 3B are approximately 1.6 times higher than the signals in
regions 1, 3 and 5. This duplex has a slightly higher predicted Tm
than the duplex comprising regions 2, 4 and 6. The duplexes were
dissociated by raising the temperature to 45°C for 15 minutes, and
the hybridizations were repeated in the reverse order (Figs. 3D and
3E), demonstrating specificity of hybridization in the reverse
direction.



^ c-vmrhesis o f. and "y'-^^^f'i nation 

3. rnmbTnatorial Synthesis or^ ^^y^^ Matri ~ 

of a Nurleic A fid Target co, — a iri.^ 

In a light-directed synthesis, the location
and composition of products depends on the pattern of illumination
and the order of chemical coupling reagents (see Fodor et al.,
Science (1991) 251:767-773, for a complete description). Consider
the synthesis of 256 tetranucleotides, as illustrated in Fig. 4.
Mask 1 activates one fourth of the substrate surface for coupling
with the first of four nucleosides in the first round of synthesis.
In cycle 2, mask 2 activates a different quarter of the substrate for
coupling with the second nucleoside. The process is continued to
build four regions of mononucleotides. The masks of round 2 are
perpendicular to those of round 1, and each cycle of round 2
generates four new dinucleotides. The process continues through
round 2 to form sixteen dinucleotides as illustrated in Fig. 4. The
masks of round 3 further subdivide the synthesis regions so that
each coupling cycle generates 16 trimers. The subdivision of the
substrate is continued through round 4 to form the
tetranucleotides. The synthesis of this probe matrix can be
compactly represented in polynomial notation as (A+C+G+T)^4.
Expansion of this polynomial yields the 256 tetranucleotides.
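The polynomial notation can be read as a Cartesian product, with each factor (A+C+G+T) contributing one base choice per round of synthesis. The following is a minimal sketch of that expansion in Python; it is an illustration only, and the function name `expand_polynomial` is a hypothetical name, not something taken from the disclosure.

```python
from itertools import product

def expand_polynomial(bases="ACGT", rounds=4):
    """Expand (b1+b2+...)^rounds into the list of all sequences of that length."""
    return ["".join(p) for p in product(bases, repeat=rounds)]

tetramers = expand_polynomial()
print(len(tetramers))    # number of distinct tetranucleotides
print(tetramers[:4])     # first few members of the expansion
```

The same call with two bases and eight rounds, `expand_polynomial("AT", 8)`, also gives 256 sequences, which is why a two-base octanucleotide array has the same number of members as a four-base tetranucleotide array.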



The application of an array of 256 probes synthesized by
light-directed combinatorial synthesis to generate a probe matrix is
illustrated in Fig. 5A. The polynomial for this synthesis is given
by 3'-CG(A+G+C+T)^4CG. The synthesis map is given in Fig. 5B. All
possible tetranucleotides were synthesized flanked by CG at the 3'-
and 5'-ends. Hybridization of target 5'-GCGGCGGC-fluorescein to this
array at 15°C correctly yielded the S-3'-CGCCGCCG complementary probe
as the most intense position (2,698 counts). Significant intensity
was also observed for the following mismatches: S-3'-CGCAGCCG
(554 counts), S-3'-CGCCGACG (317 counts), S-3'-CGCCGTCG (272 counts),
S-3'-CGACGCCG (242 counts), S-3'-CGTCGCCG (203 counts), S-3'-CGCCGCCG
(180 counts), S-3'-CGCTGCCG (163 counts), S-3'-CGCCACCG (125 counts),
and S-3'-CGCCTCCG (78 counts).
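As a worked check of the result above, the sketch below derives the complementary probe for the target and locates the mismatch position of each listed probe. It is an illustration under stated assumptions: the helper names are hypothetical, positions are counted from the 3'-end of the probe as written, and only the probe sequences that are unambiguous in the text are included.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complementary_probe(target_5to3):
    """Probe, written 3'->5', that pairs base-by-base with a target written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in target_5to3)

def mismatch_positions(probe, reference):
    """1-based positions (from the 3'-end as written) where probe differs from reference."""
    return [i + 1 for i, (p, r) in enumerate(zip(probe, reference)) if p != r]

perfect = complementary_probe("GCGGCGGC")

# Photon counts quoted in the paragraph above (unambiguous entries only).
observed = {
    "CGCAGCCG": 554, "CGCCGACG": 317, "CGCCGTCG": 272,
    "CGACGCCG": 242, "CGTCGCCG": 203, "CGCCACCG": 125, "CGCCTCCG": 78,
}

print(perfect)
for probe, counts in observed.items():
    print(probe, counts, mismatch_positions(probe, perfect))
```

Each of these probes differs from the perfect complement at exactly one of the four variable positions (3 through 6), since the flanking CG dinucleotides are fixed in this array.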

C. Mismatch Analysis

The arrays discussed above can be utilized in the present
method to determine the nucleic acid sequence of an oligonucleotide
of length n using an array of probes of shorter length. Fig. 6
illustrates a simple example. The target has a sequence 5'-XXYXY,
where X and Y are complementary nucleic acids such as A and T or C
and G. For discussion purposes, the illustration in Fig. 6 is
simplified by using only two bases and very short sequences. The
technique can easily be extended to larger nucleic acids with, for
example, all 4 RNA or DNA bases.

The sequence of the target is, generally, not known
ab initio. One can determine the sequence of the target using the
present method with an array of shorter probes. In this example an
array of all possible X and Y 4-mers is synthesized and then used to
determine the sequence of a 5-mer target.

Initially, a "core" probe is identified. The core probe
is a probe determined, using the mismatch analysis method of the
present invention, to be exactly complementary to a sequence in the
target. The core probe is identified using one or both of the
following criteria:

1. The core probe exhibits stronger binding
affinity to the target than other probes, typically the
strongest binding affinity of any probe in the array
(that has not been identified as a core probe in a
previous cycle of analysis).

2. Probes that are mismatched with the target,
as compared to the core probe sequence, exhibit a
characteristic pattern, discussed in greater detail
below, in which probes that mismatch at the 3'- and
5'-end of the probe bind more strongly to the target than
probes mismatched at internal positions.



In this particular example, selection criterion #1 identifies a core
4-mer probe with the strongest binding affinity to the target that
has the sequence 3'-YYXY, as shown in Fig. 6A, where the probe is
illustrated as having hybridized to the target. This probe
(corresponding to the 5'-XXYX position of the target) is, therefore,
chosen as the "core" probe.

Selection criterion #2 is utilized as a "check" to ensure
the core probe is exactly complementary to the target nucleic acid.
The second selection criterion evaluates hybridization data (such as
the fluorescence intensity of a labeled target hybridized to an array
of probes on a substrate, although other techniques are well known to
those of skill in the art) of probes that have single base mismatches
as compared to the core probe. In this particular case, the core
probe has been selected as S-3'-YYXY. The single base mismatched
probes of this core probe are: S-3'-XYXY, S-3'-YXXY, S-3'-YYYY, and
S-3'-YYXX. The binding affinity characteristics of these single base
mismatches are utilized to ensure that a "correct" core probe has been
selected, or to select the core probe from among a set of probes
exhibiting similar binding affinities.
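The set of single base mismatched probes for any core probe can be enumerated mechanically. The sketch below is illustrative only; `single_base_mismatches` is a hypothetical helper name, and the two-letter alphabet "XY" simply mirrors the simplified example in the text.

```python
def single_base_mismatches(core, alphabet):
    """Map each 1-based position to the probes that differ from core only there."""
    result = {}
    for i, base in enumerate(core):
        result[i + 1] = [core[:i] + b + core[i + 1:] for b in alphabet if b != base]
    return result

print(single_base_mismatches("YYXY", "XY"))
# {1: ['XYXY'], 2: ['YXXY'], 3: ['YYYY'], 4: ['YYXX']}
```

With a four-letter alphabet the same function returns three mismatched probes per position rather than one.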

A hypothetical plot of expected binding
affinity versus mismatch position is provided in Fig. 6B. The
binding affinity values (typically fluorescence intensity of labeled
target hybridized to probe, although many other factors relating to
affinity may be utilized) are all normalized to the binding affinity
of S-3'-YYXY to the target, which is plotted as a value of 1 on the
left hand portion of the graph. Because only two nucleotides are
involved in this example, the value plotted for a probe mismatched at
position 1 (the nucleotide at the 3'-end of the probe) is the
normalized binding affinity of S-3'-XYXY. The value plotted for
mismatch at position 2 is the normalized affinity of S-3'-YXXY. The
value plotted for mismatch at position 3 is the normalized affinity
of S-3'-YYYY, and the value plotted for mismatch position 4 is the
normalized affinity of S-3'-YYXX. As noted above, affinity may be
measured in a number of ways including, for example, the number of
photon counts from fluorescence markers on the target.
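The normalization described above amounts to dividing each mismatch intensity by the core probe intensity. The following is a small sketch, assuming photon counts as the affinity measure; the function name and the numbers are hypothetical and are not data from the text.

```python
def normalized_profile(core_intensity, mismatch_intensities):
    """Return {position: intensity / core_intensity}; position 0 is the core itself."""
    if core_intensity <= 0:
        raise ValueError("core intensity must be positive")
    profile = {0: 1.0}
    for position, value in mismatch_intensities.items():
        profile[position] = value / core_intensity
    return profile

# Hypothetical photon counts for a 4-mer core and its four mismatch positions.
profile = normalized_profile(1000, {1: 600, 2: 150, 3: 100, 4: 400})
print(profile)
```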

The affinity of each of the mismatched probes is lower than that of
the core in this illustration. Moreover, the affinity plot shows that
a mismatch at the 3'-end of the probe has less impact than a mismatch
at the 5'-end of the probe in this particular case, although this may
not always be the case. Further, mismatches at the end of the probe
result in less disturbance than mismatches at the center of the
probe. These features, which result in a "smile" shaped graph when
plotted as shown in Fig. 6B, will be found in most plots of single
base mismatch after selection of a "correct" core probe, and after
accounting for a mismatched probe that is a core probe with respect
to another sequence in the target.



This information will be utilized either to select the core probe or
to ensure that an appropriate core probe has been selected. Of
course, in certain situations, as in Section B above,
identification of a core probe will be sufficient, for example, in
forensic or genetic studies, and the like.

In sequencing, this process is then repeated for
left and/or right extensions of the core probe. In the example
illustrated in Fig. 6, only right extensions of the core probe are
possible. The possible extension probes of the core probe are
3'-YXYX and 3'-YXYY. Again, the same selection criteria are
utilized. Between 3'-YXYX and 3'-YXYY, it would normally be found
that 3'-YXYX would have the strongest binding affinity, and this
probe is selected as the correct extension. This selection may
be confirmed by again plotting the normalized binding affinity of
probes with single base mismatches as compared to the core probe.
A hypothetical plot is illustrated in Fig. 6. Again, the
characteristic "smile" pattern is observed, indicating that the
"correct" extension has been selected, i.e., 3'-YXYX. From this
information, one would conclude that the sequence of the
target is 5'-XXYXY.

D. Examples

1. (A+T)^8 Array

[...] synthesis was performed using [...]. The lithographic masks
were chosen such that each member of a set of 256 octanucleotides was
synthesized in four separate locations on the array, yielding 1024
different synthesis sites, each containing an octanucleotide probe
[...]. Following synthesis and [...] the dA amine, the
substrate was mounted in a thermostatically regulated staining and
flow cell, incubated with [...] 5'-AAAAAAAA-fluorescein at 15°C, and
then scanned in a Zeiss [...] epifluorescence microscope. The
resulting fluorescent image is shown in [...].

Fluorescence intensities of the hybridization events as a
function of single base mismatch are presented graphically in Fig. 8.
Each of the four independent intensities for each octanucleotide
probe that differs from the core probe at a single base is plotted.
Position zero mismatch (i.e., the perfect complement 3'-TTTTTTTT) is
the brightest position on the array [...]. The background
signal of this array is [...] counts. Mismatch position 1 (at the
3'-end of the probe) is the next brightest at approximately 760
counts. The "smile" [...] of the following positions indicates
the relative stability of the mismatches at each position of the
probe/target complex. This "mismatch family" characterizes
nucleic acid interaction with an array of probes and provides or
confirms the identification of the target sequence. The mismatches
at positions 3, 4, 5 and 6 are more destabilizing and yield
intensities virtually indistinguishable from background.
The mismatch at position 1 (the point where the 3'-end of the
octanucleotide is tethered to the substrate) is less destabilizing
than the corresponding mismatch at position 8 (the free 5'-end).
The uniformity of the array synthesis and the target hybridization
is reflected in the low variation of intensities between the four
duplicate synthesis sites.

The method of the present invention can also utilize
information from target hybridization to probes with two or more
mismatches. Intensity data for probes with pairs of
mismatches are presented in Fig. 9. In this case, the intensity data
have been normalized so that a perfect match has intensity 1. For
example, the data at index 1,8 corresponds to mismatches at
positions 1 and 8 of the probe/target duplex. The diagonal (index 1,1
to 8,8) corresponds to the single mismatches illustrated in Fig. 8.
The highest intensities correspond to single and pairs of mismatches
at the ends of the probe/target complex.

2. (G+T)^8 Array Sequence Reconstruction

A (G+T)^8 octanucleotide array of MenPoc-dG and MenPoc-dT
was synthesized [...]. After final deprotection and attachment to
a hybridization chamber, the probe array was incubated with
[...] 5'-AACCCAAACCC-fluorescein target and scanned. The resulting
image is given in Fig. 10. Four distinct but overlapping, perfectly
complementary hybridizations are expected:
TTGGGTTT, TGGGTTTG, GGGTTTGG, and GGTTTGGG. As shown herein, the
[...] stability of probe/target complexes with single base pair
mismatches generates families of probes with moderate signals.
A cursory inspection of the many intense features of Fig. 10 reveals
a complex pattern.
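The four expected perfectly complementary octanucleotides follow directly from sliding an 8-base window along the complement of the 11-base target. The sketch below illustrates this; `perfect_match_probes` is a hypothetical helper name, and probes are written 3'->5' so that they align base-by-base with the target written 5'->3'.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def perfect_match_probes(target_5to3, k):
    """All k-mer probes (written 3'->5') exactly complementary to a window of the target."""
    comp = "".join(COMPLEMENT[b] for b in target_5to3)
    return [comp[i:i + k] for i in range(len(comp) - k + 1)]

print(perfect_match_probes("AACCCAAACCC", 8))
# ['TTGGGTTT', 'TGGGTTTG', 'GGGTTTGG', 'GGTTTGGG']
```

A target of length n gives n - k + 1 such probes, here 11 - 8 + 1 = 4.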

The reconstruction heuristic provided by the present
invention effectively utilizes the complex data pattern in Fig. 10.
The algorithm assumes as a general rule that perfectly matched
probe/target complexes have higher fluorescence intensities and
that perfect matches and related single base mismatches typically
form a profile similar to that shown in Fig. 6.

The probe with the highest intensity should be a perfect
match to the target. Corresponding mismatch profiles are shown in
Figs. 11A to 11C. One first plots the mismatch profile for the probe
with the highest intensity (S-3'-TGGGTTTG in this case) to verify
that the probe is exactly complementary to the target. Assuming that
this probe is complementary to a fragment of the target, we consider
"extending" a base on the 3'-end of the target. In this case, there
are two probe choices. One of the two 8-mer probes, S-3'-GGGTTTGT and
S-3'-GGGTTTGG, will be exactly complementary to the target nucleic
acid. The mismatch profile for each of these two probes, as well as
for probe S-3'-TGGGTTTG, is shown with intensity values in Fig. 11A.
Note that the probe S-3'-GGGTTTGG has the mismatch profile most
similar to that of probe S-3'-TGGGTTTG (a typical "smile" plot).
Therefore, one will conclude that the correct extension probe is
S-3'-GGGTTTGG.

Fig. 11B shows repetition of this process to evaluate the
3'-end of the target sequence. Because the probe S-3'-GGTTTGGG has a
smile-shaped mismatch profile most like the core S-3'-GGGTTTGG, and
because the probe S-3'-GGTTTGGT does not, one will correctly conclude
that the probe S-3'-GGTTTGGG is the correct extension probe. This
process can be repeated until neither profile has the correct shape,
or the absolute intensity is well below that of the highest
intensity, indicating that the "end" of the target has been reached.
A similar method provides the sequence of the target extending to the
5'-end. Fig. 11C shows the mismatch curves for all the perfectly
matched probes; each curve has the consistent shape predicted for
this target.

The techniques described above can of course be readily
extended to nucleic acids of any length, as illustrated in the
various panels of Figs. 12A to 12D. As shown in Fig. 12A, a 10-mer
target is to be sequenced, and the sequence is indicated by
5'-N1N2N3N4N5N6N7N8N9N10-3', where N is any nucleotide or nucleic
acid monomer, and the subscript indicates the nucleotide position in
the probe, with 3 indicating the 3'-end terminal monomer. Those of
skill recognize that, if the probes were synthesized with the 5'-end
attached to the substrate, the method of the invention can be applied
with appropriate modification.

An array of shorter oligonucleotides can be used to
sequence a larger nucleotide according to one aspect of the present
invention. In the particular example shown in Figs. 12A to 12D,
4-mers (oligonucleotide probes 4 monomers in length) are used to
sequence the unknown 10-mer target. In practice, longer probes and
targets will typically be employed, but this illustrative example
facilitates understanding of the invention. A single member of the
4-mer array is shown in Fig. 12A and has the sequence S-3'-P3P4P5P6,
where the various P (probe) nucleotides will be selected from the
group of A, T, C, U, G, and other monomers, depending on the
application, and the subscript indicates position relative to the
target. For discussion purposes, the hybridization data are presumed
[...]. A probe length of 4 is selected to facilitate discussion; in
practice, longer probes will typically be employed.

S-3'-P3P4P5P6 is selected as a core probe from the array
due to its exhibition of strong binding affinity to the target and
a correct mismatch profile. When a fluorescein-labeled target is
exposed to the array, the target hybridizes to the probe, as
indicated by the high fluorescence intensity (i.e., a large number
of photon counts) observed in the portion of the array containing
S-3'-P3P4P5P6. The sequence exhibiting the strongest binding
affinity is chosen as the first core sequence.

One then verifies whether the first core sequence is a
perfect complement to the target by examining the fluorescence
intensity of probes in the array that differ from the core probe at
a single base. Fig. 12B qualitatively illustrates a typical plot of
relative intensity of single base mismatches versus position of the
mismatch for the S-3'-P3P4P5P6 core probe. As a simple example,
assume that in the sequence S-3'-P3P4P5P6 the nucleotide C is not
present. Fig. 12B illustrates, in a qualitative way, the normalized
fluorescence intensity of probes in which C is substituted into the
sequence [...]. Such probes contain mismatches unless a substituted
probe is exactly complementary to another sequence in the target.
Accordingly, Fig. 12B plots the relative fluorescence intensity of
the probe set:

S-3'-CP4P5P6,
S-3'-P3CP5P6,
S-3'-P3P4CP6, and
S-3'-P3P4P5C

when they are hybridized to the target, normalized to the core
probe. In alternative embodiments, average curves are plotted for
substitution of all the possible nucleotides at each position (the
"families" of mismatched probes), or the highest intensity is plotted
for each position. Thus, the 0 position on the graph
in Fig. 12B represents no substitution and shows the fluorescence
intensity due to target hybridization to core probe S-3'-P3P4P5P6.
Because all values in Fig. 12B are normalized with respect to this
value, the "no substitution" case has a normalized intensity of 1.
When C is substituted at the 3, 4, 5, and 6 positions, the relative
intensity values are normally less, because none of these sequences
are exactly complementary to the target in this example.

The relative fluorescence intensity of a probe/target
complex with a mismatch at the 3'- or 5'-end is typically higher
than complexes with mismatches in the center of the probe/target
complex, because mismatches at the end of the probe tend to be less
destabilizing than mismatches at the center of the probe/target
complex. Probe/target complexes with mismatches at the 3'-end of
the probes may impact hybridization less (and thus have a higher
fluorescence intensity) than those with mismatches at the 5'-end of
the probes, presumably due to the proximity of the 3'-end of the
probe to the substrate surface in this embodiment. Therefore, a
curve plotting a normalized factor related to binding affinity versus
mismatch position tends to have the shape of a "crooked smile," as
shown in Fig. 12B.
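One simple way to test a mismatch profile for the "smile" shape is to require that both end positions be brighter than every interior position while all mismatches stay below the core. The sketch below is only one possible criterion, stated as an assumption; the text does not prescribe a numerical test, and the function names and sample values are hypothetical.

```python
def looks_like_smile(profile):
    """profile: normalized intensities for mismatch positions 1..n, 3'-end first.

    Assumed criterion: both end positions exceed every interior position,
    and every mismatch is weaker than the core (normalized value below 1).
    """
    if len(profile) < 3:
        raise ValueError("need at least three mismatch positions")
    interior = profile[1:-1]
    ends_high = min(profile[0], profile[-1]) > max(interior)
    below_core = all(v < 1.0 for v in profile)
    return ends_high and below_core

def is_crooked(profile):
    """True when the 3'-end mismatch is brighter than the 5'-end mismatch."""
    return profile[0] > profile[-1]

print(looks_like_smile([0.6, 0.15, 0.1, 0.4]))   # ends high, centre low
print(looks_like_smile([0.2, 0.9, 0.1, 0.3]))    # interior peak breaks the shape
print(is_crooked([0.6, 0.15, 0.1, 0.4]))
```

An interior peak, as in the second call, is the kind of deviation that would suggest the chosen core is not exactly complementary to the target or that a substituted probe matches another part of the target.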

Using this methodology, one can extend a core sequence by 
examining probes on the array that have the same sequence as the core 
probe except for having been extended at one end and optionally 
shortened at the other. These probes are evaluated as candidate 
second core sequences to determine which probes are perfectly 
hybridized to the target. By repetition of this process, one can 
determine the complete nucleotide sequence of the target. 

To illustrate the method, Fig. 12C shows the 4 possible,
4-member "left extensions" of the core probe S-3'-P3P4P5P6.
As shown, the nucleotide adjacent to the sequence of the target
complementary to S-3'-P3P4P5P6 is either A, T, C, or G, or there is
no adjacent nucleotide on the target (i.e., P3 is complementary to
the 5'-end of the target). Therefore, the possible left extensions
of the P3P4P5P6 core probe are probes S-3'-AP3P4P5, S-3'-TP3P4P5,
S-3'-CP3P4P5, and S-3'-GP3P4P5. For the purposes of this
illustration, T is assumed to be actually "correct," as A is in the
complementary position in the target nucleic acid.

The upper left hand plot in Fig. 12D illustrates
predicted hybridization data for the mismatch profile of the
S-3'-AP3P4P5 probe, with all data normalized to S-3'-AP3P4P5. Data
points for all substitutions at each of the 2-5 positions are shown,
but the average data for the three substitutions at each position
could also be utilized, a single substitution at each position can be
utilized, the highest of the three values may be utilized, or some
other combination. As shown in the S-3'-AP3P4P5 graph, one point
shows much higher binding affinity than the rest. This is the T
substitution for A at position two. The remaining data in the
AP3P4P5 graph have the normal "smile" characteristics shown in
Fig. 12B. Similar plots are developed for the C and G substitutions,
shown in the bottom portion of Fig. 12D. In each case, all
data points are normalized to the "core" probe in the graph.

The T extension graph, in the upper right hand
portion of Fig. 12D, differs from the S-3'-AP3P4P5 graph and others
because none of the monosubstitutions at position 2 of the
S-3'-TP3P4P5 probe are exactly complementary to the target.
Accordingly, A, C, and G at position 2 all produce the [...]
predicted for probes with single base mismatches to the target. In
addition, the fluorescence intensity of the T substituted
probe/target complex will normally be higher than the fluorescence
intensity of the C, G, and A probe/target complexes. These factors
can be used in various combinations to determine which of the
extensions is "correct" and thereby determine the sequence of the
target.

From the data shown in Fig. 12D, one concludes
that the probe S-3'-TP3P4P5 is the correct extension of the core
probe, and that the target sequence has an A monomer at position 2.

This process is repeated until none of the graphs have
appropriate mismatch characteristics, and it is concluded that an
end of the target has been reached. Similarly, right extensions are
evaluated until the end of the target (or the sequence of
interest) is reached.

The above techniques can obviously be conducted through
manual observation of the hybridization data. However, in preferred
embodiments the data are analyzed using one or more appropriately
programmed digital computers, such as illustrated
in Fig. 13. As shown, the system includes a computer or
computers 302 operating under the control of a CPU and including
memory 304, such as a hard disk, and memory 306, such as dynamic
random access memory. The system also includes a scanning
device 308 that detects fluorescence intensity or other related
information from a labeled target nucleotide coupled to portions of a
substrate 312. The substrate 312 contains probe nucleotides of known
sequence at known locations thereon. [...] output devices 313.

Fluorescence intensity or other related information is
stored in the memory 304/306. CPU 310 processes the fluorescence
data to provide output to [...]. The data are processed according
to the methods described herein, and output may take the form of
graphs such as those shown above, [...], or simple output of the
results of the analysis of such data.



Suitable computers include, for example, an IBM PC or compatible, a
SPARC workstation, or [...].

Fig. 14 is a flowchart of a typical computer program
used to evaluate an array of n-mers and determine the sequence of a
nucleic acid exactly complementary [...] in
sequencing or other applications. As shown therein, the system first
identifies a core probe at step [...] by, for example, selecting a
probe having the highest binding affinity among a specified set of
probes. The present method can be operational in iterative
fashion if the correct probe in the array is not
selected after the first iteration. It may be
worthwhile, for example, to select the several
strongest binding probes and perform left and right extensions on
each, then store and compare [...] data before
providing the final output. The result is further confirmation of the
correct sequence.

At step 404 the system identifies all left extensions of
the core n-mer. At step 406 the system selects the appropriate left
extension by one or both of:

- determining which of the left extensions exhibits
the behavior most consistent with a preset monomer
substitution pattern, and/or

- selecting the left extension exhibiting the
highest binding affinity.

The above selection criteria will in some embodiments be
used in an AND fashion, i.e., both criteria must be met or the
system assumes that one has either reached the terminal monomer or
the system is not performing properly. In alternative embodiments,
one of the criteria may be used as the primary selection mechanism,
and the other may be used to provide the user with warnings,
potentially incorrect selections, or alternate selections.

Thereafter, the system determines if the selection
criteria have been met. If not, then
the system assumes that the end of the sequence has been reached at
step 410. If the criteria are met, then the process
is repeated beginning at step 404 with the new "core" selected as the
correct extension from the previous cycle.

Thereafter, the process is effectively repeated for
right extensions. At step 412 right extensions are identified. At
step 414 a preset mismatch profile probe is identified and/or a high
affinity right extension is selected. The system then determines if
the terminus of the molecule has been reached. If not, then the
process is repeated to step 412. If so, it is assumed that the
terminus of the molecule has been reached, and the process is
terminated with appropriate output to a printer or other output
device.
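The iterative procedure above can be sketched as a greedy walk outward from the core probe. The code below is a simplified illustration, not the program of Fig. 14: it applies only the highest-binding-affinity criterion together with an assumed cutoff (`min_fraction`, a hypothetical parameter), whereas an AND-style embodiment would also require each accepted extension to pass a mismatch-profile check. The function names and the intensity values are hypothetical.

```python
def extend_one_side(core, intensities, alphabet, side, min_fraction=0.5, max_steps=100):
    """Walk from the core in one direction and return the bases added, in walk order.

    Probes are written 3'->5'. A left extension prepends a base and drops the
    last base; a right extension drops the first base and appends one.
    """
    added = []
    current = core
    floor = min_fraction * intensities[core]   # assumed acceptance cutoff
    for _ in range(max_steps):                 # guard against endless walks
        if side == "left":
            candidates = [b + current[:-1] for b in alphabet]
        else:
            candidates = [current[1:] + b for b in alphabet]
        best = max(candidates, key=lambda p: intensities.get(p, 0.0))
        if intensities.get(best, 0.0) < floor:
            break                              # no acceptable extension: terminus assumed
        added.append(best[0] if side == "left" else best[-1])
        current = best
    return added

def reconstruct(intensities, alphabet):
    """Return the probe-strand sequence (3'->5') assembled from overlapping probes."""
    core = max(intensities, key=intensities.get)   # strongest binder is taken as the core
    left = extend_one_side(core, intensities, alphabet, "left")
    right = extend_one_side(core, intensities, alphabet, "right")
    return "".join(reversed(left)) + core + "".join(right)

# Hypothetical photon counts for a (G+T) octanucleotide array; unlisted probes read as 0.
toy_intensities = {
    "TGGGTTTG": 1000, "GGGTTTGG": 900, "TTGGGTTT": 850, "GGTTTGGG": 800,
    "GGGTTTGT": 200, "TGGGTTTT": 150,
}
probe_strand = reconstruct(toy_intensities, "GT")
complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
print(probe_strand)
print("".join(complement[b] for b in probe_strand))   # inferred target, written 5'->3'
```

Keeping the cutoff relative to the core intensity mirrors the observation that extension stops when the absolute intensity falls well below that of the brightest probe.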



G. Applications

The techniques described herein will have a wide range of
applications. For example, the techniques may be used to determine
if a target nucleic acid [...]. These techniques may be applied in a
wide variety of fields including diagnostics, forensics, and others.

For example, assume a "wild-type" nucleic acid has the
sequence 5'-[...], where N refers to a monomer such as a
nucleotide in a nucleic acid and the subscript refers to position
number. Assume that a target nucleic acid is to be evaluated to
determine if it is the same as the wild-type or if it differs from
this sequence, and so contains a mutation or mutant sequence. The
target nucleic acid is initially exposed to an array of typically
shorter probes, as discussed above. Thereafter, one or more "core"
sequences are identified, each of which would be expected to have a
high binding affinity to the target if the target does not contain a
mutant sequence or mutation. In this example, one probe
that would be expected to exhibit high binding affinity would be the
complement to [...], assuming an 8-mer array is
utilized. Again, it will be recognized that the probes and/or the
target may be part of a longer nucleic acid molecule.

As an initial screening tool, the absolute binding
affinity of the target to the [...] probe will be utilized to
determine if the first [...] positions of the target are of the
expected sequence. If the complement to [...] does not exhibit
strong binding to the target, it can be properly concluded that the
target is not of the wild-type.

The single base mismatch profile can also be utilized
according to the present invention to determine if the target
contains a mutant or wild-type sequence. Figs. 15A and 15B
illustrate typical plots resulting from targets that are
wild-type (Fig. 15A) and mutant (Fig. 15B). As shown, the single
base mismatch plots for the wild-type generally follow the
typical, smile-shaped plot. When the target has a
mutation at a particular position, not only will the absolute binding
affinity of the target to a particular core probe be less, but the
single base mismatch characteristics will deviate from expected
behavior.

According to one aspect of the invention, a substrate
having a selected group of nucleic acids (referred to
herein as a "library" of nucleic acids) is used in the determination
of whether a particular nucleic acid is the same or different than a
wild-type or other expected nucleic acid. Libraries of nucleic acids
will normally be provided as an array of probes or "probe array."
Such probe arrays are preferably formed on a single substrate in
which the identity of a probe is determined by way of its location
on the substrate. Optionally, such substrates will not only
determine if the nucleotide sequence of a target is the same as the
wild-type, but will also provide sequence information regarding
the target. Such substrates will find use in fields noted above such
as in forensics, diagnostics, and others. Merely by way of specific
example, the invention may be utilized in diagnostics associated with
sickle cell anemia detection, detection of any of the large number
of P-53 mutations, for any of the large number of cystic fibrosis
mutations, for any particular variant sequence associated with the
highly polymorphic HLA class 1 or class 2 genes (particularly class 2
DP, DQ and DR beta genes), as well as many other sequences associated
with genetic diseases, genetic predisposition, and genetic
evaluation.

When a substrate is to be used in such applications, it
is not necessary to provide all of the possible nucleic acids of a
particular length on the substrate. Instead, it will be possible
using the present invention to provide only a relatively small subset
of all the possible sequences. For example, suppose a target nucleic
acid comprises a 5-base sequence of particular interest and that one
wishes to develop a substrate that may be used to detect a single
substitution in the 5-base sequence. According to one aspect of the
invention, the substrate will be formed with the expected 5-base
sequence formed on a surface thereof, along with all or most of the
single base mismatch probes of the 5-base sequence. Accordingly,
it will not be necessary to include all possible 5-base sequences
on the substrate, although larger arrays will often be preferred.
Typically, the length of the nucleic acid probes on the substrate
according to the present invention will be between about [...] and 100
bases, between about 5 and 50 bases, between about 8 and 30 bases, or
between about 8 and 15 bases.

By selection of the single base mismatch probes among all
possible probes of a certain length, the number of probes on the
substrate can be greatly limited. For example, in a 3-base sequence
there are 64 possible DNA base sequences, but there will be only one
exact complement to an expected sequence and 9 possible single base
mismatch probes. By selecting only these probes, the diversity
necessary for screening will be reduced. Preferably, but not
necessarily, all of such single base mismatch probes are synthesized
on a single substrate. While substrates will often be formed
including other probes of interest in addition to the single base
mismatches, such substrates will normally still include less than
[...] of all the possible probes of n-bases, often less than 30% of
all the possible probes of n-bases, often less than 20% of all the
possible probes of n-bases, often less than 10% of the possible
probes of n-bases, and often less than 5% of the possible probes of
n-bases.
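The arithmetic behind this reduction is that there are 4^n possible n-base DNA sequences but only 3n single base mismatches of any one sequence. The sketch below works the 3-base case; `reduced_probe_set` is a hypothetical name, and the sequence "ACG" is an arbitrary placeholder for the expected complement.

```python
def reduced_probe_set(expected_complement, alphabet="ACGT"):
    """Exact complement followed by every single base mismatch of it."""
    probes = [expected_complement]
    for i, base in enumerate(expected_complement):
        probes += [expected_complement[:i] + b + expected_complement[i + 1:]
                   for b in alphabet if b != base]
    return probes

n = 3
total = 4 ** n                        # all possible 3-base DNA sequences
selected = reduced_probe_set("ACG")   # placeholder sequence, for illustration only
print(total, len(selected) - 1, len(selected) / total)
```

For n = 3 the reduced set is 10 of 64 probes; the fraction falls quickly with length, since (3n + 1) / 4^n is already below 0.04% for an 8-mer.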

Nucleic acid probes will often be provided in a kit for
analysis of a specific genetic sequence. According to one embodiment,
the kits will include a probe complementary to a target nucleic acid
of interest. In addition, the kit will include single base
mismatches of the target. The kit will normally include one or more
of C, G, T, A and/or U single base mismatches of such probe. Such
kits will often be provided with appropriate instructions for use of
the complementary probe and single base mismatches in determining the
sequence of a particular nucleic acid sample in accordance with the
teachings herein. According to one aspect of the invention, the kit
provides for the complement to the target, along with only the single
base mismatches. Such kits will often be utilized in assessing a
particular sample of genetic material to determine if it indicates a
particular genetic characteristic. For example, such kits may be
utilized in the evaluation of a sample as mentioned above in the
detection of sickle cell anemia, detection of any of the large number
of P-53 mutations, detection of the large number of cystic fibrosis
mutations, detection of particular variant sequences associated with
the highly polymorphic HLA class 1 or class 2 genes (particularly
class 2 DP, DQ and DR beta genes), as well as detection of many other
sequences associated with genetic diseases, genetic predisposition,
and genetic evaluation.

Accordingly, it is seen that substrates with probes
selected according to the present invention will be capable of
performing many mutation detection and other functions, but will need
only a limited number of probes to perform such functions.

Rxamples 

^ ^rt^T)e Rrrav ^"^ pif f e ir^nr i al Spouencinq 

'ALrV^7^\(G-.T) 8 array was prepared and incubated with 
1 nM 5--^6cCAkLcC-fluorescein (representing a mutant sequence when 

orpared^o S-^LaaCCC,. and scanned to "st whether the sequence 
was -wild" or "mutaJf^" The resulting image is given in Fig 16. 
Four overlapping, exact complementary octanucleotide probe/target 
hybridizations are expected if one is assuming the target should be 
5'-AACCCAAACCC with probes S-3'-TTGGGTTG, TGGGTTGG, GGGTTGGG, and 
GGTTGGGG. The results demonstrated that the effect of a single base 
change is quite dramatic, especially in the number and identity of 
the different mismatched probe/target complexes that form on the 
array. If one assumes the target nucleic acid generating the signal 
in Fig. 16 is 5'-AACCCAAACCC (i.e., the wild-type), then the mismatch 
profiles for the complementary probe S-3'-TTGGGTTT are shown in 
[...]. The mismatch profile does not have the expected shape, and 
the probe/target complex has a low fluorescence intensity. The 
signal corresponding to a mismatch in position 8 indicates that 
the "correct" base in this position in the target is probably an A, 
because only A and C are found in the target in this experiment. 
Mismatch position 6 also shows a small peak. By contrast, a similar 
plot using the sequence S-3'-TTGGGTTG as a core 
yielded the "smile" shape and high fluorescence intensity. 
In Fig. 17B the same profile for the next 8-mer probe is shown. The 
peaks have shifted one position to the left, again confirming that 
the sequence varies from wild-type at position 8 in the target. 
These correspond to the same positions in the original 11-mer target 
fragment. These data predict that there is a single base change in 
position 8 of the target, as compared to the wild-type. 
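
The array and the exact-complement octamers discussed in this example can be enumerated programmatically. The sketch below is illustrative only: it assumes the array holds every 8-mer over G and T, treats probes as written 3'->5' so that they pair base by base with a 5'->3' target, and uses hypothetical function names.

```python
from itertools import product

# Illustrative sketch only; helper names are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def build_array(bases="GT", length=8):
    """All probes of the given length over the given bases (assumed array content)."""
    return {"".join(p) for p in product(bases, repeat=length)}


def exact_complement_probes(target, length=8):
    """Overlapping probes (written 3'->5') that pair exactly with a 5'->3' target."""
    comp = "".join(COMPLEMENT[b] for b in target)
    return [comp[i:i + length] for i in range(len(target) - length + 1)]


def differing_positions(seq_a, seq_b):
    """1-based positions at which two equal-length sequences differ."""
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


array = build_array()
labeled_target = "AACCCAACCCC"   # fluorescein-labeled target from the example
wild_type = "AACCCAAACCC"        # wild-type sequence from the example

probes = exact_complement_probes(labeled_target)
on_array = [p for p in probes if p in array]
changed = differing_positions(wild_type, labeled_target)
print(len(array), probes, changed)
```

An 11-base target offers four overlapping 8-base windows, so at most four exact-complement octamers can be present on such an array for a given target.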

All of the mismatch probe profiles corresponding to the 
assumed fragment 5'-AACCCAACCCC are shown in Fig. 17C. One observes 
the mutant position "moving" down the sequence. Finally, in Fig. 17D 
the mismatch plots are shown corresponding to the 
complement 5'-AACCCAACCCC, with the expected smile characteristics. 
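
The way the mismatch peak moves as successive 8-mer probes are examined can be illustrated with a short calculation. This is a sketch under stated assumptions: probe positions are counted 1 to 8 in the order the probe is written, the function names are hypothetical, and only sequence identity is modeled, not fluorescence intensity.

```python
# Illustrative sketch only; helper names are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq):
    """Base-by-base complement, written in the opposite strand orientation."""
    return "".join(COMPLEMENT[b] for b in seq)


def mismatch_positions(assumed_target, actual_target, length=8):
    """For each overlapping probe window, list the 1-based probe positions
    at which the probe built for the assumed target fails to pair with
    the actual target."""
    assumed = complement(assumed_target)
    actual = complement(actual_target)
    result = []
    for start in range(len(assumed) - length + 1):
        a = assumed[start:start + length]
        b = actual[start:start + length]
        result.append([i + 1 for i in range(length) if a[i] != b[i]])
    return result


# Sequences taken from the example: assumed wild-type versus labeled target.
shifts = mismatch_positions("AACCCAAACCC", "AACCCAACCCC")
for window, positions in enumerate(shifts, start=1):
    print(f"probe {window}: mismatch at probe position(s) {positions}")
```

Each step to the next overlapping probe moves the mismatched probe position one place toward the start of the probe, while the corresponding position in the 11-mer target stays fixed.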
H. Conclusion 

The present inventions provide improved methods and 
devices for the study of nucleotide sequences and nucleic acid 
interactions with other molecules. The above description is 
illustrative and not restrictive. Many variations of the invention 
will become apparent to those of skill in the art upon review of this 
disclosure. Merely by way of example certain of the inventions 
described herein will have application to other polymers such as 
peptides and proteins, and can utilize other synthesis techniques. 
The scope of the invention should, therefore, be determined not with 
reference to the above description, but instead should be determined 
with reference to the appended claims along with their full scope of 
equivalents. 